user command
Robot Trajectron V2: A Probabilistic Shared Control Framework for Navigation
Song, Pinhao, Du, Yurui, Saussus, Ophelie, De Schrijver, Sofie, Caprara, Irene, Janssen, Peter, Detry, Renaud
We propose a probabilistic shared-control solution for navigation, called Robot Trajectron V2 (RT-V2), that enables accurate intent prediction and safe, effective assistance in human-robot interaction. RT-V2 jointly models a user's long-term behavioral patterns and their noisy, low-dimensional control signals by combining a prior intent model with a posterior update that accounts for real-time user input and environmental context. The prior captures the multimodal and history-dependent nature of user intent using recurrent neural networks and conditional variational autoencoders, while the posterior integrates this with uncertain user commands to infer desired actions. We conduct extensive experiments to validate RT-V2 across synthetic benchmarks, human-computer interaction studies with keyboard input, and brain-machine interface experiments with non-human primates. Results show that RT-V2 outperforms the state of the art in intent estimation, provides safe and efficient navigation support, and adequately balances user autonomy with assistive intervention. By unifying probabilistic modeling, reinforcement learning, and safe optimization, RT-V2 offers a principled and generalizable approach to shared control for diverse assistive technologies.
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- North America > United States > Utah (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
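The prior-plus-posterior scheme the abstract describes is, at its core, a Bayesian update of goal probabilities given a noisy user command. A minimal sketch of that idea (our illustration with discrete goals and a Gaussian command-noise model, not the paper's CVAE-based model):

```python
import numpy as np

def intent_posterior(prior, goals, position, command, sigma=0.5):
    """Bayes update of goal probabilities from a noisy direction command.

    prior:    (K,) prior probability of each goal
    goals:    (K, 2) candidate goal positions
    position: (2,) current robot position
    command:  (2,) noisy unit-vector user input
    """
    directions = goals - position                        # vectors toward goals
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Gaussian likelihood of the observed command given each goal direction
    err = np.linalg.norm(directions - command, axis=1)
    lik = np.exp(-0.5 * (err / sigma) ** 2)
    post = prior * lik
    return post / post.sum()

prior = np.array([0.5, 0.5])
goals = np.array([[1.0, 0.0], [0.0, 1.0]])
post = intent_posterior(prior, goals, np.zeros(2), np.array([1.0, 0.0]))
```

With a command pointing straight at the first goal, nearly all posterior mass shifts to it; a learned prior would simply replace the uniform one here.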
How Can LLMs and Knowledge Graphs Contribute to Robot Safety? A Few-Shot Learning Approach
Althobaiti, Abdulrahman, Ayala, Angel, Gao, JingYing, Almutairi, Ali, Deghat, Mohammad, Razzak, Imran, Cruz, Francisco
Large Language Models (LLMs) are transforming the robotics domain by enabling robots to comprehend and execute natural language instructions. The cornerstone benefits of LLMs include the ability to process textual data from technical manuals, instructions, academic papers, and user queries, grounded in the knowledge provided. However, deploying LLM-generated code in robotic systems without safety verification poses significant risks. This paper outlines a safety layer that verifies the code generated by ChatGPT before executing it to control a drone in a simulated environment. The safety layer consists of a fine-tuned GPT-4o model using Few-Shot learning, supported by knowledge graph prompting (KGP). Our approach improves the safety and compliance of robotic actions, ensuring that they adhere to the regulations of drone operations.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Brazil > Pernambuco > Recife (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.88)
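A pre-execution safety layer of this kind can be sketched as a static check on generated code. The rules below (`BLOCKED_CALLS`, `MAX_ALTITUDE`, the `fly_to` signature) are hypothetical placeholders for illustration, not the paper's fine-tuned GPT-4o verifier:

```python
import ast

BLOCKED_CALLS = {"disarm", "exec", "eval"}  # hypothetical disallowed calls
MAX_ALTITUDE = 120  # illustrative regulatory ceiling, metres

def verify_drone_code(source: str) -> bool:
    """Return True only if the generated code passes static safety checks."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = getattr(node.func, "id", getattr(node.func, "attr", ""))
            if name in BLOCKED_CALLS:
                return False
            # Hypothetical convention: fly_to(x, y, altitude)
            if name == "fly_to" and node.args:
                alt = node.args[-1]
                if isinstance(alt, ast.Constant) and alt.value > MAX_ALTITUDE:
                    return False
    return True

ok = verify_drone_code("fly_to(10, 10, 50)")
bad = verify_drone_code("fly_to(10, 10, 500)")
```

Only code that passes the check would be forwarded to the simulator; the paper's LLM-based verifier plays the role of `verify_drone_code` with learned, regulation-derived rules.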
From Words to Wheels: Automated Style-Customized Policy Generation for Autonomous Driving
Han, Xu, Chen, Xianda, Cai, Zhenghan, Cai, Pinlong, Zhu, Meixin, Chu, Xiaowen
Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitations in delivering command-based driving styles. Most existing methods either rely on predefined driving styles that require expert input or use data-driven techniques like Inverse Reinforcement Learning to extract styles from driving data. These approaches, though effective in some cases, face challenges: difficulty obtaining specific driving data for style matching (e.g., in Robotaxis), inability to align driving style metrics with user preferences, and limitations to pre-existing styles, restricting customization and generalization to new commands. This paper introduces Words2Wheels, a framework that automatically generates customized driving policies based on natural language user commands. Words2Wheels employs a Style-Customized Reward Function to generate a Style-Customized Driving Policy without relying on prior driving data. By leveraging large language models and a Driving Style Database, the framework efficiently retrieves, adapts, and generalizes driving styles. A Statistical Evaluation module ensures alignment with user preferences. Experimental results demonstrate that Words2Wheels outperforms existing methods in accuracy, generalization, and adaptability, offering a novel solution for customized AV driving behavior. Code and demo available at https://yokhon.github.io/Words2Wheels/.
- Asia > China > Guangdong Province > Guangzhou (0.05)
- Asia > China > Tianjin Province > Tianjin (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (3 more...)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Goal Estimation-based Adaptive Shared Control for Brain-Machine Interfaces Remote Robot Navigation
Muraoka, Tomoka, Aoki, Tatsuya, Hirata, Masayuki, Taniguchi, Tadahiro, Horii, Takato, Nagai, Takayuki
In this study, we propose a shared control method for teleoperated mobile robots using brain-machine interfaces (BMI). The control commands generated through BMI for robot operation face issues of low input frequency, discreteness, and uncertainty due to noise. To address these challenges, our method estimates the user's intended goal from their commands and uses this goal to generate auxiliary commands through the autonomous system that are both higher-frequency and more continuous. Furthermore, by defining the confidence level of the estimation, we adaptively calculate the weights for combining user and autonomous commands, thus achieving shared control. We conducted navigation experiments in simulated environments and participant experiments in real environments, including user ratings, using a pseudo-BMI setup. The proposed method significantly reduced obstacle collisions in all experiments. It markedly shortened path lengths under almost all conditions in simulation and, in participant experiments, especially when user inputs became more discrete and noisy (p < 0.01). Under such challenging conditions, users were also able to operate more easily, with greater confidence, and at a comfortable pace.
- Research Report > Experimental Study (0.88)
- Research Report > New Finding (0.70)
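The confidence-weighted combination of user and autonomous commands described above can be written as a convex blend. A minimal sketch, assuming velocity-style commands and a confidence score already normalized to [0, 1]:

```python
import numpy as np

def blend_commands(user_cmd, auto_cmd, confidence):
    """Blend user and autonomous velocity commands by estimation confidence.

    Higher confidence in the goal estimate shifts weight toward the
    autonomous command; low confidence defers to the user.
    """
    w = np.clip(confidence, 0.0, 1.0)
    return (1.0 - w) * np.asarray(user_cmd) + w * np.asarray(auto_cmd)

cmd = blend_commands([1.0, 0.0], [0.0, 1.0], 0.75)
```

How the confidence itself is computed (here assumed given) is the crux of the paper's adaptive scheme.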
Thoughtful Things: Building Human-Centric Smart Devices with Small Language Models
King, Evan, Yu, Haoxiang, Vartak, Sahil, Jacob, Jenna, Lee, Sangsu, Julien, Christine
Everyday devices like light bulbs and kitchen appliances are now embedded with so many features and automated behaviors that they have become complicated to actually use. While such "smart" capabilities can better support users' goals, the task of learning the "ins and outs" of different devices is daunting. Voice assistants aim to solve this problem by providing a natural language interface to devices, yet such assistants cannot understand loosely-constrained commands, they lack the ability to reason about and explain devices' behaviors to users, and they rely on connectivity to intrusive cloud infrastructure. Toward addressing these issues, we propose thoughtful things: devices that leverage lightweight, on-device language models to take actions and explain their behaviors in response to unconstrained user commands. We propose an end-to-end framework that leverages formal modeling, automated training data synthesis, and generative language models to create devices that are both capable and thoughtful in the presence of unconstrained user goals and inquiries. Our framework requires no labeled data and can be deployed on-device, with no cloud dependency. We implement two thoughtful things (a lamp and a thermostat) and deploy them on real hardware, evaluating their practical performance.
- North America > United States > Texas > Travis County > Austin (0.29)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Overview (0.92)
- Research Report (0.82)
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Cai, Yuzhu, Yin, Sheng, Wei, Yuxi, Xu, Chenxin, Mao, Weibo, Juefei-Xu, Felix, Chen, Siheng, Wang, Yanfeng
The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools without necessitating internal model revision. Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions by refining user commands and rectifying model outputs. Systematic evaluation metrics, combining GPT4-V, HEIM, and FairFace scores, assess alignment capability. Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models like DALLE 3, ensuring user-generated content adheres to ethical standards while maintaining image quality. This study indicates the potential of Ethical-Lens to ensure the sustainable development of open-source text-to-image tools and their beneficial integration into society. Our code is available at https://github.com/yuzhu-cai/Ethical-Lens.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- (2 more...)
- Research Report (1.00)
- Instructional Material > Online (0.61)
- Instructional Material > Course Syllabus & Notes (0.61)
- Media (1.00)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- (5 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
New AI test measures how fast robots can respond to user commands
Artificial intelligence benchmarking group MLCommons on Wednesday released a fresh set of tests and results that rate the speed at which top-of-the-line hardware can run AI applications and respond to users. The two new benchmarks added by MLCommons measure the speed at which AI chips and systems can generate responses from powerful AI models packed with data. The results roughly demonstrate how quickly an AI application such as ChatGPT can deliver a response to a user query. One of the new benchmarks added the capability to measure the speed of a question-and-answer scenario for large language models.
Robot Trajectron: Trajectory Prediction-based Shared Control for Robot Manipulation
Song, Pinhao, Li, Pengteng, Aertbelien, Erwin, Detry, Renaud
We address the problem of (a) predicting the trajectory of an arm reaching motion, based on a few seconds of the motion's onset, and (b) leveraging this predictor to facilitate shared-control manipulation tasks, easing the cognitive load of the operator by assisting them in their anticipated direction of motion. Our novel intent estimator, dubbed the Robot Trajectron (RT), produces a probabilistic representation of the robot's anticipated trajectory based on its recent position, velocity and acceleration history. Taking arm dynamics into account allows RT to capture the operator's intent better than other SOTA models that only use the arm's position, making it particularly well-suited to assist in tasks where the operator's intent is susceptible to change. We derive a novel shared-control solution that combines RT's predictive capacity with a representation of the locations of potential reaching targets. Our experiments demonstrate RT's effectiveness in both intent estimation and shared-control tasks. We will make the code and data supporting our experiments publicly available at https://github.com/mousecpn/Robot-Trajectron.git.
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
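Scoring candidate reaching targets against a predicted trajectory can be illustrated with a simple kinematic rollout. This stand-in uses a constant-acceleration extrapolation from position, velocity, and acceleration, not RT's learned probabilistic predictor:

```python
import numpy as np

def score_targets(pos, vel, acc, targets, horizon=1.0, sigma=0.2):
    """Score candidate targets by proximity to a constant-acceleration rollout.

    pos, vel, acc: (2,) current kinematic state
    targets:       (K, 2) candidate reaching targets
    Returns normalized scores summing to 1.
    """
    # Predicted position after `horizon` seconds under constant acceleration
    pred = pos + vel * horizon + 0.5 * acc * horizon ** 2
    d = np.linalg.norm(targets - pred, axis=1)
    scores = np.exp(-0.5 * (d / sigma) ** 2)
    return scores / scores.sum()

targets = np.array([[1.0, 0.0], [-1.0, 0.0]])
p = score_targets(np.zeros(2), np.array([1.0, 0.0]), np.zeros(2), targets)
```

An arm moving toward the first target concentrates the score there; RT replaces the rollout with a learned distribution over future trajectories, which is what makes it robust to changing intent.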
Sasha: Creative Goal-Oriented Reasoning in Smart Homes with Large Language Models
King, Evan, Yu, Haoxiang, Lee, Sangsu, Julien, Christine
Smart home assistants function best when user commands are direct and well-specified (e.g., "turn on the kitchen light"), or when a hard-coded routine specifies the response. In more natural communication, however, human speech is unconstrained, often describing goals (e.g., "make it cozy in here" or "help me save energy") rather than indicating specific target devices and actions to take on those devices. Current systems fail to understand these under-specified commands since they cannot reason about devices and settings as they relate to human situations. We introduce large language models (LLMs) to this problem space, exploring their use for controlling devices and creating automation routines in response to under-specified user commands in smart homes. We empirically study the baseline quality and failure modes of LLM-created action plans with a survey of age-diverse users. We find that LLMs can reason creatively to achieve challenging goals, but they experience patterns of failure that diminish their usefulness. We address these gaps with Sasha, a smarter smart home assistant. Sasha responds to loosely-constrained commands like "make it cozy" or "help me sleep better" by executing plans to achieve user goals, e.g., setting a mood with available devices, or devising automation routines. We implement and evaluate Sasha in a hands-on user study, showing the capabilities and limitations of LLM-driven smart homes when faced with unconstrained user-generated scenarios.
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > New York (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Questionnaire & Opinion Survey (1.00)
- Research Report > New Finding (0.92)
Human-Centric Autonomous Systems With LLMs for User Command Reasoning
Yang, Yi, Zhang, Qingwen, Li, Ci, Marta, Daniel Simões, Batool, Nazre, Folkesson, John
The evolution of autonomous driving has made remarkable advancements in recent years, evolving into a tangible reality. However, a human-centric large-scale adoption hinges on meeting a variety of multifaceted requirements. To ensure that the autonomous system meets the user's intent, it is essential to accurately discern and interpret user commands, especially in complex or emergency situations. To this end, we propose to leverage the reasoning capabilities of Large Language Models (LLMs) to infer system requirements from in-cabin users' commands. Through a series of experiments that include different LLM models and prompt designs, we explore the few-shot multivariate binary classification accuracy of system requirements from natural language textual commands. We confirm the general ability of LLMs to understand and reason about prompts but underline that their effectiveness is conditioned on the quality of both the LLM model and the design of appropriate sequential prompts. Code and models are publicly available at https://github.com/KTH-RPL/DriveCmd_LLM.
- Information Technology (0.69)
- Transportation > Ground > Road (0.36)
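Few-shot multivariate binary classification with an LLM ultimately reduces to parsing per-requirement yes/no answers out of the model's reply. A small sketch with hypothetical requirement labels (not the paper's actual label set or prompt format):

```python
import re

# Hypothetical in-cabin system requirements, one binary label each
REQUIREMENTS = ["pull_over", "call_emergency", "reroute", "adjust_speed"]

def parse_multilabel(reply: str) -> dict:
    """Parse a 'requirement: yes/no' LLM reply into binary labels."""
    labels = {r: 0 for r in REQUIREMENTS}
    for r in REQUIREMENTS:
        m = re.search(rf"{r}\s*:\s*(yes|no)", reply, re.IGNORECASE)
        if m:
            labels[r] = int(m.group(1).lower() == "yes")
    return labels

reply = "pull_over: yes\ncall_emergency: no\nreroute: yes\nadjust_speed: no"
labels = parse_multilabel(reply)
```

Accuracy is then simply per-label agreement with ground truth, which is the metric the abstract's experiments vary across LLM models and prompt designs.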